1 R Code and labels

There are two types of R code in R Markdown documents: R code chunks and inline R code. The syntax for inline R code is `r R_CODE`, and it can be embedded inline with other document elements. R code chunks look like plain code blocks but have {r} after the first three backticks and (optionally) chunk options inside the braces, e.g.,

```{r, eval = FALSE}
rn <- rnorm(1000, 100, 15)
(mean(rn))
(sd(rn))
```

When the previous code is evaluated, random numbers are generated from a \(\mathcal{N}(\mu = 100, \sigma = 15)\) distribution.

rn <- rnorm(1000, 100, 15)
(mean(rn))
[1] 100.4287
(sd(rn))
[1] 15.52064

The mean of rn rounded to two decimal places is 100.43, while the standard deviation of rn rounded to two decimal places is 15.52. The inline code expressions `r round(mean(rn), 2)` and `r round(sd(rn), 2)` are used to compute and round the mean and standard deviation of rn to two decimal places, respectively.

If you have used \(\LaTeX{}\), inserting beautiful mathematics in your documents should not be an issue. However, bookdown (Xie, 2020) requires a particular labeling syntax so that equations in documents written in R Markdown number and cross-reference properly. To number an equation, enclose the equation in an equation environment and assign a label to the equation using the syntax (\#eq:label). Equation labels must start with the prefix eq: in bookdown. To refer to the equation, use the syntax \@ref(eq:label). For example:

\begin{equation}
\bar{x} = \sum_{i=1}^{n}\frac{x_i}{n}
(\#eq:xbar)
\end{equation}

renders equation (1.1).

\[\begin{equation} \bar{x} = \sum_{i=1}^{n}\frac{x_i}{n} \tag{1.1} \end{equation}\]

A slightly more complex example, showing the intermediate numbers used to compute the mean (from inline R code) for the values stored in rn, is given below and rendered in (1.2).

\begin{equation}
\bar{x} = \sum_{i=1}^{n}\frac{x_i}{n} = \frac{`r sum(rn)`}{`r length(rn)`} = `r mean(rn)`
(\#eq:numbers)
\end{equation}
\[\begin{equation} \bar{x} = \sum_{i=1}^{n}\frac{x_i}{n} = \frac{1.0042869\times 10^{5}}{1000} = 100.4286852 \tag{1.2} \end{equation}\]

1.1 Graph with a caption

The following code was used to create Figure 1.1.

```{r, label = "histo", fig.cap = "Histogram of 1000 randomly generated values from a N(100, 15) distribution", echo = FALSE}
hist(rn, main = "", xlab = "", col = "pink")
```

Figure 1.1: Histogram of 1000 randomly generated values from a N(100, 15) distribution

When creating figures, make sure to use a label inside your R code chunk. To refer to the figure, use the syntax \@ref(fig:label).
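As a minimal sketch (the chunk label scatter and its caption are hypothetical), a labeled figure chunk and a cross-reference to it might look like:

```{r, label = "scatter", fig.cap = "Scatter plot of rn"}
plot(rn)
```

Elsewhere in the document, Figure \@ref(fig:scatter) would then render as the figure number.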

1.2 Tables

Table 1.1 was created using the R code chunk and code below. Note that to refer to Table 1.1, the name of the code chunk label is used. In this case, the code chunk label is FT; and to refer to Table 1.1, one uses the syntax \@ref(tab:FT).

```{r, label = "FT", echo = FALSE}
    knitr::kable(head(iris), booktabs = TRUE,  caption = 'The first six rows of `iris`')
```
Table 1.1: The first six rows of iris
Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
         5.1          3.5           1.4          0.2   setosa
         4.9          3.0           1.4          0.2   setosa
         4.7          3.2           1.3          0.2   setosa
         4.6          3.1           1.5          0.2   setosa
         5.0          3.6           1.4          0.2   setosa
         5.4          3.9           1.7          0.4   setosa

2 Citations and creation of *.bib files

One way to create a *.bib file is to use Zotero. It is also possible to create *.bib entries for R and R packages using the following code:

```{r, echo = FALSE, results = "hide"}
# Create vector of packages you use
PackagesUsed <- c("ggplot2", "bookdown", "knitr", "base", "dplyr", "zoo", "plotly", "lubridate", "tidyverse", "readr")
# Write bib information---NOTE: packages.bib is stored in the working directory
knitr::write_bib(PackagesUsed, file = "./packages.bib")
# Load packages
lapply(PackagesUsed, library, character.only = TRUE)
```

The first three entries of the packages.bib file are shown below.

@Manual{R-base,
  title = {R: A Language and Environment for Statistical Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2018},
  url = {https://www.R-project.org/},
}
@Manual{R-bookdown,
  title = {bookdown: Authoring Books and Technical Documents with R Markdown},
  author = {Yihui Xie},
  year = {2020},
  note = {R package version 0.19},
  url = {https://CRAN.R-project.org/package=bookdown},
}
@Manual{R-dplyr,
  title = {dplyr: A Grammar of Data Manipulation},
  author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
  year = {2020},
  note = {R package version 0.8.5},
  url = {https://CRAN.R-project.org/package=dplyr},
}

Citations go inside square brackets and are separated by semicolons. Each citation must have a key, composed of @ plus the citation identifier from the database, and may optionally have a prefix, a locator, and a suffix. See the Pandoc documentation for examples. This document relies on R (R Core Team, 2018) as well as many other packages. The citation (R Core Team, 2018) was created with [@R-base].
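As a sketch built from the keys in the packages.bib file shown above, a bracketed citation with a prefix and locators might be written as (the page and chapter numbers are made up):

```
[see @R-base, pp. 33-35; also @R-bookdown, ch. 1]
```

which renders as something like (see R Core Team, 2018, pp. 33–35; also Xie, 2020, ch. 1).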

The first entry of the Alayna.bib file is shown below.

@article{goldberg_structure_1993,
    title = {The structure of phenotypic personality traits},
    issn = {0003-066X},
    url = {https://login.proxy006.nclive.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edsbig&AN=edsbig.A13605369&site=eds-live&scope=site},
    abstract = {This personal historical article traces the development of the Big-Five factor structure, whose growing acceptance by personality researchers has profoundly influenced the scientific study of individual differences. The roots of this taxonomy lie in the lexical hypothesis and the insights of Sir Francis Galton, the prescience of L. L. Thurstone, the legacy of Raymond B. Cattell, and the seminal analyses of Tupes and Christal. Paradoxically, the present popularity of this model owes much to its many critics, each of whom tried to replace it, but failed. In reaction, there have been a number of attempts to assimilate other models into the five-factor structure. Lately, some practical implications of the emerging consensus can be seen in such contexts as personnel selection and classification.},
    number = {n1},
    urldate = {2020-05-10},
    journal = {The American Psychologist},
    author = {Goldberg, Lewis R.},
    year = {1993},
    note = {Publisher: American Psychological Association, Inc.},
    keywords = {Personality -- Research, Personality assessment -- Models, Phenotype -- Research},
    file = {EBSCO Full Text:/Users/alan/Zotero/storage/IWRNVJUA/Goldberg - 1993 - The structure of phenotypic personality traits.pdf:application/pdf}
}

Using the Zotero *.bib file

The next paragraph illustrates different citations using the *.bib file Alayna.bib.

The structure of phenotypic personality traits is addressed in Goldberg (1993). The role of the medial frontal cortex in cognitive control and its ramifications are great (see Goldberg, 1993, pp. 3–4; Hooker et al., 2008, ch.2; Ridderinkhof et al., 2004, pp. 3–5). Goldberg (1993, p. 4) says the phenotypic personality trait is out of this world.

The citations Goldberg (1993), (see Goldberg, 1993, pp. 3–4; Hooker et al., 2008, ch.2; Ridderinkhof et al., 2004, pp. 3–5), and Goldberg (1993, p. 4) were created with @goldberg_structure_1993, [see @goldberg_structure_1993, pp. 3-4; @ridderinkhof_role_2004, pp. 3-5; @hooker_influence_2008, ch.2], and @goldberg_structure_1993 [p. 4].

The YAML

Citations will only work if your *.bib files are included in the YAML. Note in the example below that there are two *.bib files. By default, Pandoc will use a Chicago author-date format for citations and references. To use another style, you will need to specify a CSL 1.0 style file in the csl metadata field. In the example below, apa.csl is used. CSL files may be downloaded from https://www.zotero.org/styles. Make sure to store the CSL file in your working directory or specify the path to the CSL file in the metadata field of your YAML.

  • Discuss how to specify a path!
---
title: "Some Bookdown Features"
author: "Alan T. Arnholt"
date: 'Last compiled: `r format(Sys.time(), "%B %d, %Y")`'
output: bookdown::html_document2
bibliography: [Alayna.bib, packages.bib]
link-citations: true
csl: apa.csl
---

3 A ggplot2/plotly graph

Figure 3.1 is a plotly graph that was converted from a ggplot2 graph. Make sure to credit package authors when using packages. In this section, we use ggplot2 (Wickham et al., 2020) and plotly (Sievert et al., 2020) to create Figure 3.1.

p1 <- ggplot(data = mtcars, aes(x = disp, y = mpg)) + 
  geom_point(aes(color = as.factor(cyl))) + 
  geom_smooth(formula = y~x, se = FALSE) +
  labs(color = "Cylinders") +
  theme_bw()
p2 <- ggplotly(p1)
p2

Figure 3.1: The standard ggplot2 scatter plot

4 Covid-19 data

Note: The majority of the covid code and ideas were taken from https://www.sharpsightlabs.com/blog/.

Obtaining the data compiled by the Center for Systems Science and Engineering (CSSE) team at Johns Hopkins.

today <- Sys.Date()  # return current date as class Date
# will use (today - 1) as download files are generally one day behind actual dates
# CSSEGIS files are actually updated once a day around 23:59 (UTC)
url <- 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
covid_data_RAW <- read_csv(url)
# rename
covid_data_RAW %>% 
  rename('subregion' = `Province/State` 
         ,'country' = 'Country/Region'
         ,'lat' = 'Lat'
         ,'long' = 'Long'
         ) ->
  covid_data
# Reshape data
covid_data %>% 
  pivot_longer(cols = -one_of('country', 'subregion', 'lat', 'long'),
               names_to = 'date', 
               values_to = 'confirmed') ->
  covid_data
# reorder columns
# NOTE: everything(): Matches all variables.
covid_data %>% 
  select(country, subregion, everything()) ->
  covid_data
# convert date
# lubridate::mdy() converts month-day-year strings to R dates
covid_data %>% 
  mutate(date = mdy(date)) ->
  covid_data
#
covid_data %>% 
  select(country, date, confirmed) %>% 
  filter(country %in% c('US', 'Spain', 'United Kingdom', 'Italy')) %>% 
  filter(date == (today - 1)) %>% 
  group_by(country, date) %>% 
  summarise(total_confirmed = sum(confirmed))
# A tibble: 4 x 3
# Groups:   country [4]
  country        date       total_confirmed
  <chr>          <date>               <dbl>
1 Italy          2020-06-14          236989
2 Spain          2020-06-14          243928
3 United Kingdom 2020-06-14          297342
4 US             2020-06-14         2094058
covid_data %>% 
  filter(date == (today - 1)) %>% 
  select(country, confirmed) %>% 
  group_by(country) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  arrange(desc(confirmed)) %>% 
  top_n(10) %>% 
  ggplot(aes(y = fct_reorder(country, confirmed), x = confirmed)) +
    geom_bar(stat = 'identity', fill = "lightblue", color = "blue") +
    labs(y = '') + 
    theme_bw()

4.1 Verify numbers with COVID-19 Dashboard

COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.

4.2 Grabbing all of the important data

We need to get two additional streams of data, the “deaths” and “recovered” cases, and wrangle those data sets into tidy tibbles. To do this, we will write functions that follow the basic structure used to tidy the “confirmed” cases. There will be a suite of functions to rename columns, to “gather” (pivot) data, to convert dates, to rearrange columns, to read csv files, and to merge data.

4.2.1 Rename columns

covid_rename_columns <- function(input_data){
  input_data %>% 
    dplyr::rename('subregion' = 'Province/State',
                  'country' = 'Country/Region',
                  'lat' = 'Lat',
                  'long' = 'Long') ->
    output_data
return(output_data)
}

4.2.2 Gather data

covid_pivot_data <- function(input_data, value_var_name){
  input_data %>% 
    pivot_longer(cols = -one_of('country','subregion','lat','long'),
                 names_to = 'date',
                 values_to = value_var_name) ->
    output_data
  return(output_data)
}

4.2.3 Convert dates

covid_convert_dates <- function(input_data){
  input_data %>% 
    dplyr::mutate(date = mdy(date)) ->
    output_data
  return(output_data)
}

4.2.4 Rearrange data

covid_rearrange_data <- function(input_data){
  input_data %>% 
    dplyr::select(country, subregion, date, lat, long, everything()) %>% 
    dplyr::arrange(country, subregion, date) ->
    output_data
  return(output_data)
}

4.2.5 Get and wrangle data

covid_get_data <- function(input_url, value_var_name){
  covid_data_inprocess <- read_csv(input_url)
  covid_data_inprocess <- covid_rename_columns(covid_data_inprocess)
  covid_data_inprocess <- covid_pivot_data(covid_data_inprocess, value_var_name)
  covid_data_inprocess <- covid_convert_dates(covid_data_inprocess)
  covid_data_inprocess <- covid_rearrange_data(covid_data_inprocess)
  return(covid_data_inprocess)
}

4.2.6 Get data

url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
url_deaths = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
url_recovered = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'
covid_confirmed <- covid_get_data(url_confirmed, 'confirmed')
covid_deaths <- covid_get_data(url_deaths, 'dead')
covid_recovered <- covid_get_data(url_recovered, 'recovered')

4.2.7 Merge data

Next, the three tibbles (covid_confirmed, covid_deaths, covid_recovered) will be merged on their common columns.

covid_confirmed %>% 
  left_join(covid_deaths, by = c("country", "subregion", "date", "lat", "long")) %>% 
  left_join(covid_recovered, by = c("country", "subregion", "date", "lat", "long")) ->
  covid_data
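The left_join() steps can be illustrated with base R's merge(), which plays the same role (the tiny data frames below are made up for illustration):

```r
# toy stand-ins for two of the covid tibbles
confirmed <- data.frame(country = c("X", "Y"),
                        date = as.Date("2020-06-14"),
                        confirmed = c(3, 4))
deaths <- data.frame(country = c("X", "Y"),
                     date = as.Date("2020-06-14"),
                     dead = c(1, 0))
# all.x = TRUE keeps every row of the left table, like left_join()
merged <- merge(confirmed, deaths, by = c("country", "date"), all.x = TRUE)
merged
```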

4.2.8 Add new cases and daily deaths

covid_data %>% 
  arrange(country, subregion, date) %>% 
  group_by(country, subregion) %>% 
  mutate(new_cases = confirmed - lag(confirmed),
         daily_dead = dead - lag(dead)) %>% 
  ungroup() ->
  covid_data
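The confirmed - lag(confirmed) step above computes day-over-day increments within each group; the same arithmetic can be sketched in base R on a single toy vector (the counts are made up):

```r
confirmed <- c(0, 2, 5, 9)           # toy cumulative counts for one country
new_cases <- c(NA, diff(confirmed))  # matches confirmed - dplyr::lag(confirmed)
new_cases
```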

4.3 Visual data exploration

4.3.1 Scatter plot

Note: To arrive at the total confirmed or dead for a country on a given date, one will need to sum over all subregions in each country. For example, the United Kingdom will have eleven subregions in its total.
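A base-R sketch of that group-and-sum step (the rows below are hypothetical; in the real data there is one row per subregion per date):

```r
toy <- data.frame(country = c("United Kingdom", "United Kingdom", "Italy"),
                  confirmed = c(5, 7, 10))
# collapse subregion rows to one total per country, like group_by() + summarise()
totals <- aggregate(confirmed ~ country, data = toy, FUN = sum)
totals
```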

covid_data %>% 
  filter(date == (today - 1)) %>% 
  select(country, confirmed, dead, recovered) %>% 
  group_by(country) %>% 
  summarise(dead = sum(dead), 
            confirmed = sum(confirmed)) %>% 
  ggplot(aes(x = confirmed, y = dead)) +
    geom_point(alpha = 0.4, aes(color = country)) +
    geom_smooth(se = FALSE, size = 0.5, color = "pink") +
    guides(color = FALSE) +
    theme_bw() -> p1
library(plotly)
p2 <- ggplotly(p1)
p2

4.3.2 Barplot

What follows is ugly!

covid_data %>% 
  filter(date == (today - 1)) %>% 
  select(country, confirmed) %>% 
  group_by(country) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  arrange(desc(confirmed)) %>% 
  top_n(10) %>% 
  ggplot(aes(x = country, y = confirmed)) +
    geom_bar(stat = 'identity', fill = 'black') 

Fixing up and switching axes…

covid_data %>% 
  filter(date == (today - 1)) %>% 
  select(country, confirmed) %>% 
  group_by(country) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  arrange(desc(confirmed)) %>% 
  top_n(10) %>% 
  ggplot(aes(y = fct_reorder(country, confirmed), x = confirmed)) +
    geom_bar(stat = 'identity', fill = 'gray', color = "black") +       
    labs(y = "") +
    theme_bw()

4.3.3 Line Charts

Line chart of COVID-19 cases versus time, excluding China.

covid_data %>% 
  filter(country != 'China') %>% 
  group_by(date) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  ggplot(aes(x = date, y = confirmed)) +
    geom_line(color = 'red') +
    theme_bw()

covid_data %>% 
  group_by(date) %>% 
  summarise(tot_dead = sum(dead)) %>% 
  ggplot(aes(x = date, y = tot_dead)) +
    geom_line(color = 'red') +
    theme_bw()

Figure 4.1: World wide reported COVID-19 deaths by date

covid_data %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  ggplot(aes(x = date, y = confirmed)) +
    geom_line(color = 'red') +
    facet_wrap(~country) +
    theme_bw() + 
    labs(title = "Total confirmed cases by date")

covid_data %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  ggplot(aes(x = date, y = confirmed)) +
    geom_line(color = 'red') +
    facet_wrap(~country, scales = 'free') +
    theme_bw() + 
    labs(title = "Total confirmed cases by date")

covid_data %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  summarise(total_dead = sum(dead)) %>% 
  ggplot(aes(x = date, y = total_dead)) +
    geom_line(color = 'red') +
    facet_wrap(~country) +
    theme_bw() + 
    labs(title = "Total deaths by date")

covid_data %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  summarise(total_dead = sum(dead)) %>% 
  ggplot(aes(x = date, y = total_dead)) +
    geom_line(color = 'red') +
    facet_wrap(~country, scales = "free") +
    theme_bw() + 
    labs(title = "Total deaths by date")

covid_data %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  summarise(confirmed = sum(confirmed)) %>% 
  ggplot(aes(x = date, y = confirmed, color = country)) +
    geom_line() + 
  scale_color_manual(values = c("pink", "orange", "gray", "red", "lightblue", "purple")) +
    theme_bw() + 
    labs(title = "Total confirmed cases by date")

4.4 Computing and graphing a seven-day rolling mean

covid_data %>% 
  filter(country %in% c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>%
  group_by(country, date) %>% 
  summarise(new_cases = sum(new_cases), daily_dead = sum(daily_dead)) %>% 
  mutate(new_case_rollmean_7 = rollmean(new_cases, k = 7, fill = NA, align = "right"), 
         daily_dead_rollmean = rollmean(daily_dead, k = 7, fill = NA, align = "right")) %>% 
  select(country, date, new_cases, new_case_rollmean_7, daily_dead_rollmean) %>% 
  ggplot(aes(x = date, y = new_case_rollmean_7)) +
    geom_line(aes(color = country)) +
    labs(title = 'Covid19 New Cases\n7 day rolling avg') + 
    facet_wrap(~country) + 
    guides(color = FALSE) + 
    theme_bw()
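The zoo::rollmean() call above computes a right-aligned seven-day average padded with NA; an equivalent base-R sketch uses stats::filter():

```r
# right-aligned 7-term moving average; the first 6 values are NA, like fill = NA
roll7 <- function(x) as.numeric(stats::filter(x, rep(1/7, 7), sides = 1))
y <- roll7(1:10)
y
```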

covid_data %>% 
  filter(country %in% c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>%
  group_by(country, date) %>% 
  summarise(new_cases = sum(new_cases), daily_dead = sum(daily_dead)) %>% 
  mutate(new_case_rollmean_7 = rollmean(new_cases, k = 7, fill = NA, align = "right"), 
         daily_dead_rollmean = rollmean(daily_dead, k = 7, fill = NA, align = "right")) %>% 
  select(country, date, new_cases, new_case_rollmean_7, daily_dead_rollmean) %>% 
  ggplot(aes(x = date, y = new_case_rollmean_7)) +
    geom_line(aes(color = country)) +
    labs(title = 'Covid19 New Cases\n7 day rolling avg') + 
    facet_wrap(~country, scales = "free") + 
    guides(color = FALSE) + 
    theme_bw()

covid_data %>% 
  filter(country %in% c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>%
  group_by(country, date) %>% 
  summarise(new_cases = sum(new_cases), daily_dead = sum(daily_dead)) %>% 
  mutate(new_case_rollmean_7 = rollmean(new_cases, k = 7, fill = NA, align = "right"), 
         daily_dead_rollmean = rollmean(daily_dead, k = 7, fill = NA, align = "right")) %>% 
  select(country, date, new_cases, new_case_rollmean_7, daily_dead_rollmean) %>% 
  ggplot(aes(x = date, y = daily_dead_rollmean)) +
    geom_line(aes(color = country)) +
    labs(title = 'Covid19 Daily Dead\n7 day rolling avg') + 
    facet_wrap(~country) + 
    guides(color = FALSE) + 
    theme_bw()

covid_data %>% 
  filter(country %in% c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>%
  group_by(country, date) %>% 
  summarise(new_cases = sum(new_cases), daily_dead = sum(daily_dead)) %>% 
  mutate(new_case_rollmean_7 = rollmean(new_cases, k = 7, fill = NA, align = "right"), 
         daily_dead_rollmean = rollmean(daily_dead, k = 7, fill = NA, align = "right")) %>% 
  select(country, date, new_cases, new_case_rollmean_7, daily_dead_rollmean) %>% 
  ggplot(aes(x = date, y = daily_dead_rollmean)) +
    geom_line(aes(color = country)) +
    labs(title = 'Covid19 Daily Dead\n7 day rolling avg') + 
    facet_wrap(~country, scales = "free") + 
    guides(color = FALSE) + 
    theme_bw()

4.5 Getting World Population data

https://datahub.io/core/population#r

library(jsonlite)
json_file <- 'https://datahub.io/core/population/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)
[1] "validation_report"      "population_csv"         "population_json"       
[4] "population_zip"         "population_csv_preview" "population"            
# print all tabular data(if exists any)
for(i in 1:length(json_data$resources$datahub$type)){
  if(json_data$resources$datahub$type[i]=='derived/csv'){
    path_to_file = json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    # print(data)
  }
}
pop_data <- data %>% 
  filter(Year == max(data$Year))  # Get most current year population
pop_data_rename <- pop_data %>% 
  dplyr::rename('country' = 'Country.Name', 'population' = 'Value') %>% 
  mutate(country = replace(as.vector(country), country == "United States", "US")) %>% 
  mutate(country = replace(as.vector(country), country == "Russian Federation", "Russia")) %>%
  select(country, population)
head(pop_data_rename)
                                      country population
1                                  Arab World  419790588
2                      Caribbean small states    7358965
3              Central Europe and the Baltics  102511922
4                  Early-demographic dividend 3249140605
5                         East Asia & Pacific 2328220870
6 East Asia & Pacific (excluding high income) 2081651801
###
covid_data_pop <- covid_data %>% 
    left_join(pop_data_rename, by = "country") 
head(covid_data_pop)
# A tibble: 6 x 11
  country subregion date         lat  long confirmed  dead recovered new_cases
  <chr>   <chr>     <date>     <dbl> <dbl>     <dbl> <dbl>     <dbl>     <dbl>
1 Afghan… <NA>      2020-01-22    33    65         0     0         0        NA
2 Afghan… <NA>      2020-01-23    33    65         0     0         0         0
3 Afghan… <NA>      2020-01-24    33    65         0     0         0         0
4 Afghan… <NA>      2020-01-25    33    65         0     0         0         0
5 Afghan… <NA>      2020-01-26    33    65         0     0         0         0
6 Afghan… <NA>      2020-01-27    33    65         0     0         0         0
# … with 2 more variables: daily_dead <dbl>, population <dbl>

4.5.1 Density

covid_data_pop %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  mutate(cd = (confirmed/population)) %>% 
  summarise(cd = sum(cd)) %>%
  ggplot(aes(x = date, y = cd)) +
    geom_line(color = 'red') +
    facet_wrap(~country) +
    theme_bw() + 
    labs(title = "Total confirmed cases/population (density) by date")

covid_data_pop %>% 
  filter(country %in%  c('US', 'Spain', 'Russia', "United Kingdom", "Italy", "Brazil")) %>% 
  group_by(country, date) %>% 
  mutate(dd = (dead/population)) %>% 
  summarise(dd = sum(dd)) %>%
  ggplot(aes(x = date, y = dd)) +
    geom_line(color = 'red') +
    facet_wrap(~country) +
    theme_bw() + 
    labs(title = "Total dead/population (death density) cases by date")

4.5.2 Exporting Data

if (!dir.exists("data")) dir.create("data")
write_csv(covid_data, path = "./data/covid_data.csv")
write_csv(covid_data_pop, path = "./data/covid_data_pop.csv")

5 Exercises

  1. Create a new R Markdown document and read in the covid_data stored in the data directory using the read_csv() function from the readr package (Wickham et al., 2018).

  2. Make the y-axis of Figure 4.1 log base 10. See scale_y_log10(). Match the colors for points and lines used in the logarithmic graph of total deaths in the bottom right of the Johns Hopkins covid panel (Logarithmic tab).

  3. Create a bar chart of the COVID 19 daily cases similar to the one reported in the bottom right of the Johns Hopkins covid panel (Daily Cases tab). Discuss why some of the daily reported new cases are negative. Hint: counting methods, also consider this one

  4. Change the style file in the csl metadata field of the YAML to your favorite journal's style. Note that you may have to download the appropriate CSL file, possibly from https://www.zotero.org/styles, and upload it. Use Zotero to find three articles in your field of study. Export a *.bib file (named articulos.bib) of the three articles from Zotero. Practice citing the articles in your R Markdown document.

  5. Add figure captions for the line chart in exercise 2 and the barchart in exercise 3.


References

Goldberg, L. R. (1993). The structure of phenotypic personality traits. The American Psychologist, n1. https://login.proxy006.nclive.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edsbig&AN=edsbig.A13605369&site=eds-live&scope=site

Hooker, C., D’Esposito, M., Knight, R. T., Miyakawa, A., & Verosky, S. C. (2008). The Influence of Personality on Neural Mechanisms of Observational Fear and Reward Learning. https://doi.org/10.1016/j.neuropsychologia.2008.05.005

R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. Science, 5695, 443. https://login.proxy006.nclive.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edsgsc&AN=edsgcl.123933807&site=eds-live&scope=site

Sievert, C., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M., & Despouy, P. (2020). Plotly: Create interactive web graphics via ’plotly.js’. https://CRAN.R-project.org/package=plotly

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., & Dunnington, D. (2020). Ggplot2: Create elegant data visualisations using the grammar of graphics. https://CRAN.R-project.org/package=ggplot2

Wickham, H., Hester, J., & Francois, R. (2018). Readr: Read rectangular text data. https://CRAN.R-project.org/package=readr

Xie, Y. (2020). Bookdown: Authoring books and technical documents with r markdown. https://CRAN.R-project.org/package=bookdown